linear equation
Optimal Tagging with Markov Chain Optimization
Many information systems use tags and keywords to describe and annotate content. These allow for efficient organization and categorization of items, as well as facilitate relevant search queries. As such, the selected set of tags for an item can have a considerable effect on the volume of traffic that eventually reaches an item. In tagging systems where tags are exclusively chosen by an item's owner, who in turn is interested in maximizing traffic, a principled approach for assigning tags can prove valuable. In this paper we introduce the problem of optimal tagging, where the task is to choose a subset of tags for a new item such that the probability of browsing users reaching that item is maximized.
Can Vision-Language Models Solve Visual Math Equations?
Choudhury, Monjoy Narayan, Wang, Junling, Hou, Yifan, Sachan, Mrinmaya
Despite strong performance in visual understanding and language-based reasoning, Vision-Language Models (VLMs) struggle with tasks requiring integrated perception and symbolic computation. We study this limitation through visual equation solving, where mathematical equations are embedded in images, variables are represented by object icons, and coefficients must be inferred by counting. While VLMs perform well on textual equations, they fail on visually grounded counterparts. To understand this gap, we decompose the task into coefficient counting and variable recognition, and find that counting is the primary bottleneck, even when recognition is accurate. We also observe that composing recognition and reasoning introduces additional errors, highlighting challenges in multi-step visual reasoning. Finally, as equation complexity increases, symbolic reasoning itself becomes a limiting factor. These findings reveal key weaknesses in current VLMs and point toward future improvements in visually grounded mathematical reasoning.
No-Knowledge Alarms for Misaligned LLMs-as-Judges
If we use LLMs as judges to evaluate the complex decisions of other LLMs, who or what monitors the judges? Infinite monitoring chains are inevitable whenever we do not know the ground truth of the decisions by experts and we do not want to trust them. One way to ameliorate our evaluation uncertainty is to exploit the use of logical consistency between disagreeing experts. By observing how LLM judges agree and disagree while grading other LLMs, we can compute the only possible evaluations of their grading ability. For example, if two LLM judges disagree on which tasks a third one completed correctly, they cannot both be 100\% correct in their judgments. This logic can be formalized as a Linear Programming problem in the space of integer response counts for any finite test. We use it here to develop no-knowledge alarms for misaligned LLM judges. The alarms can detect, with no false positives, that at least one member or more of an ensemble of judges are violating a user specified grading ability requirement.
A Parallelizable Approach for Characterizing NE in Zero-Sum Games After a Linear Number of Iterations of Gradient Descent
We study online optimization methods for zero-sum games, a fundamental problem in adversarial learning in machine learning, economics, and many other domains. Traditional methods approximate Nash equilibria (NE) using either regret-based methods (time-average convergence) or contraction-map-based methods (last-iterate convergence). We propose a new method based on Hamiltonian dynamics in physics and prove that it can characterize the set of NE in a finite (linear) number of iterations of alternating gradient descent in the unbounded setting, modulo degeneracy, a first in online optimization. Unlike standard methods for computing NE, our proposed approach can be parallelized and works with arbitrary learning rates, both firsts in algorithmic game theory. Experimentally, we support our results by showing our approach drastically outperforms standard methods.
Fast and Flexible Quantum-Inspired Differential Equation Solvers with Data Integration
Arenstein, Lucas, Mikkelsen, Martin, Kastoryano, Michael
Accurately solving high-dimensional partial differential equations (PDEs) remains a central challenge in computational mathematics. Traditional numerical methods, while effective in low-dimensional settings or on coarse grids, often struggle to deliver the precision required in practical applications. Recent machine learning-based approaches offer flexibility but frequently fall short in terms of accuracy and reliability, particularly in industrial contexts. In this work, we explore a quantum-inspired method based on quantized tensor trains (QTT), enabling efficient and accurate solutions to PDEs in a variety of challenging scenarios. Through several representative examples, we demonstrate that the QTT approach can achieve logarithmic scaling in both memory and computational cost for linear and nonlinear PDEs. Additionally, we introduce a novel technique for data-driven learning within the quantum-inspired framework, combining the adaptability of neural networks with enhanced accuracy and reduced training time.
An extension of linear self-attention for in-context learning
In-context learning is a remarkable property of transformers and has been the focus of recent research. An attention mechanism is a key component in transformers, in which an attention matrix encodes relationships between words in a sentence and is used as weights for words in a sentence. This mechanism is effective for capturing language representations. However, it is questionable whether naive self-attention is suitable for in-context learning in general tasks, since the computation implemented by self-attention is somewhat restrictive in terms of matrix multiplication. In fact, we may need appropriate input form designs when considering heuristic implementations of computational algorithms. In this paper, in case of linear self-attention, we extend it by introducing a bias matrix in addition to a weight matrix for an input. Despite the simple extension, the extended linear self-attention can output any constant matrix, input matrix and multiplications of two or three matrices in the input. Note that the second property implies that it can be a skip connection. Therefore, flexible matrix manipulations can be implemented by connecting the extended linear self-attention components. As an example of implementation using the extended linear self-attention, we show a heuristic construction of a batch-type gradient descent of ridge regression under a reasonable input form.
The impact of allocation strategies in subset learning on the expressive power of neural networks
Schlisselberg, Ofir, Darshan, Ran
In traditional machine learning, models are defined by a set of parameters, which are optimized to perform specific tasks. In neural networks, these parameters correspond to the synaptic weights. However, in reality, it is often infeasible to control or update all weights. This challenge is not limited to artificial networks but extends to biological networks, such as the brain, where the extent of distributed synaptic weight modification during learning remains unclear. Motivated by these insights, we theoretically investigate how different allocations of a fixed number of learnable weights influence the capacity of neural networks. Using a teacher-student setup, we introduce a benchmark to quantify the expressivity associated with each allocation. We establish conditions under which allocations have maximal or minimal expressive power in linear recurrent neural networks and linear multi-layer feedforward networks. For suboptimal allocations, we propose heuristic principles to estimate their expressivity. These principles extend to shallow ReLU networks as well. Finally, we validate our theoretical findings with empirical experiments. Our results emphasize the critical role of strategically distributing learnable weights across the network, showing that a more widespread allocation generally enhances the network's expressive power.